Developer Blog
Profiling the AI Performance Boost in OptiX 5
By Vincent Brisebois and Ankit Patel, NVIDIA
OptiX 5.0 introduces a new post-processing feature to denoise images. This denoiser is based on a paper published by NVIDIA Research, “Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder”. It uses GPU-accelerated artificial intelligence to dramatically reduce the time needed to render a high-fidelity, visually noiseless image. To understand the impact of this capability on rendering time for interactive use, we measured the performance gains from the AI-accelerated solution.
Hardware
We conducted five rendering tests on two systems. The first four benchmarks were performed on an industry-standard 1U server equipped with dual Intel(R) XEON(R) E5-2699 v4 CPUs running at 2.20 GHz. We ran the benchmark on the CPUs, then added an NVIDIA Tesla V100 GPU and repeated the tests with AI-accelerated denoising on and off.
The final benchmark was run on an NVIDIA DGX Station, which has four NVIDIA Tesla V100 GPUs with the Tensor Core architecture.
Software & Scene
For our performance benchmark we used an in-house OptiX-based sample renderer and the newly released Amazon Lumberyard Bistro scene, available through the Open Research Content Archive (ORCA).
The AI-accelerated denoiser was trained on 1,000 CG scenes, each rendered with Iray to 15 different completion levels, for a total of 15,000 images. The training data was given to an autoencoder similar to the one described in the paper. The result is an AI-accelerated denoiser, which we will include in the OptiX 5.0 SDK. It is important to note that in the following tests the denoiser ran on a scene it had never seen, inside a new test renderer built on OptiX.
We used SSIM (structural similarity index) as a quantitative measure of similarity between each test image and the perfect image. Our perfect image was rendered using the traditional, non-AI-accelerated algorithm at 1920 x 1080 resolution and 32,768 iterations. We also rendered at 65,536 iterations, but that was well past the point of diminishing returns for this scene. Note that in special cases we have seen complex scenes, such as the Bank of England scene from Project Soane, benefit from higher iteration counts. For those scenes, generating a noise-free image with traditional algorithms can require more than 65,000 iterations.
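To make the metric concrete, here is a minimal sketch of an SSIM calculation in plain Python. It is not the code used for this benchmark. It assumes grayscale pixel values flattened into a list and computes the statistics once over the whole image, whereas common SSIM implementations average the same expression over many small local windows. The function name `global_ssim` and the constants `k1` and `k2` (set to the values commonly used in the SSIM literature) are choices made for this illustration.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length sequences of pixel values.

    Simplification: statistics are taken over the whole image rather than
    averaged over local windows, so scores will differ from windowed SSIM.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")

    n = len(x)
    # Small constants that stabilise the division when means or variances are near zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

An image compared against itself scores 1.0, and the score drops as noise pulls the test image away from the reference.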
Once we had our baseline (the perfect image), we rendered the image at the same resolution both with and without AI-accelerated denoising at multiple iteration counts, and computed the SSIM of each result against the baseline. Below is the resulting table and graph of that data.
The chart above shows that at every sample point (iteration count), the denoised image is closer than the non-denoised image to the final image we rendered at 32,768 iterations. Moreover, as the SSIM score approaches 0.99 (99% similarity), the two approaches converge.
Defining interactive quality
We examined the images and ran some qualitative tests with a small sample of creative professionals. The question was at what quality level they could make a creative decision. We tested our subjects with both the Bistro scene and a scene from Pixar’s Monsters University, and found that images with an SSIM score of approximately 0.90 still contained some artifacts, while images with a score of 0.95 were subjectively indistinguishable from the perfect image. For interactive use cases, we therefore chose to measure our products on the time it takes to reach an image with an SSIM score of 0.93 when compared to the original perfect image. Note that higher SSIM targets might still be valuable for use cases such as final-frame rendering.
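The time-to-target measurement described above can be sketched as a small helper that scans a series of measurements and returns the first one that reaches the SSIM threshold. The `Sample` record and `first_sample_reaching` function are hypothetical names introduced for this illustration, and the sketch assumes each measurement records the iteration count, the elapsed render time in seconds, and the SSIM against the perfect image.

```python
from typing import Iterable, NamedTuple, Optional


class Sample(NamedTuple):
    """One measurement: iteration count, elapsed seconds, SSIM vs. the perfect image."""
    iterations: int
    seconds: float
    ssim: float


def first_sample_reaching(samples: Iterable[Sample],
                          target: float = 0.93) -> Optional[Sample]:
    """Return the lowest-iteration sample whose SSIM meets the target, or None."""
    for sample in sorted(samples, key=lambda s: s.iterations):
        if sample.ssim >= target:
            return sample
    return None
```

Running this once on the denoised series and once on the non-denoised series, then dividing the two `seconds` values, gives the speedup at the chosen quality level. Raising `target` models stricter use cases such as final-frame rendering.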